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ABSTRACT 

This manuscript proposes a posterior mean (PM) super-resolution 
(SR) metliod with a compound Gaussian Markov random field 
(MRF) prior. SR is a technique to estimate a spatially high- 
resolution image from observed multiple low-resolution images. 
A compound Gaussian MRF model provides a preferable prior for 
natural images that preserves edges. PM is the optimal estimator 
for the objective function of peak signal-to-noise ratio (PSNR). 
This estimator is numerically determined by using variational Bayes 
(VB). We then solve the conjugate prior problem on VB and the 
exponential-order calculation cost problem of a compound Gaussian 
MRF prior with simple Taylor approximations. In experiments, the 
proposed method roughly overcomes existing methods. 

Index Terms — super-resolution, fully Bayesian approach, 
Markov random field prior, variational Bayes, Taylor approxima- 
tion. 

1. INTRODUCTION 

Super-resolution (SR) is a promising technology that is expected to 
be applied to microscope time series images, satellite photographs, 
and so on. SR aims at reconstructing a spatially high-resolution (HR) 
image from multiple low-resolution (LR) images. When multiple LR 
images obtained for the same object have transformation between 
images, they will contain mutually complementaiy information, and 
it becomes possible to estimate the HR image. Since the earliest 
work 1 1 1, SR has been realized through various methods. 

In this work, we handle SR in a Bayesian framework J2], ||3]. 
In a Bayesian framework, what prior we use is quite important. For 
example, various Markov random field (MRF) priors PI- llOl , the 
total variation (TV) prior fill, fl2 1, the Huber prior [131, and patch- 
based priors |14| have been used in image processing. These can 
represent image properties well and have good performance in SR, 
image restoration, and other applications. 

As the SR estimator, we think posterior mean (PM) is suitable 
as an HR image estimator because we usually evaluate the accuracy 
of SR methods by L2-norm (mean square error) -based peak signal- 
to-noise ratio (PSNR), and PM is the optimal estimator when em- 
ploying PSNR as the objective function. To determine the exact PM 
of the HR image, all parameters other than the HR image should be 
marginalized out over the joint posterior distribution without using 
any point estimation. According to this meaning, the previous meth- 
ods |2|-|5|, nil . 1121 are not optimal. Recently, the PM approach 
was proposed |10| . It was the first method to employ the PM as the 
HR image estimator and does not use any point estimation of the 
registration parameters or the model parameters. 

In this manuscript, we propose a new SR method that employs 
a compound Gaussian MRF prior that can simultaneously represent 



smoothness and edge of image f6l and utilizes variational Bayes to 
calculate the optimal estimator, PM, with respect to the objective 
function of the PSNR. This approach seems quite favorable, but pos- 
sibly it was not proposed earlier because of an important limitation 
of variational Bayes that a conjugate prior is needed, and because 
of the exponential-order calculation cost for a compound Gaussian 
MRF prior. We solve these problems through simple Taylor approxi- 
mations. In Section 2, we show the formulation regarding the models 
and the estimator. In Section 3, we evaluate the proposed method. In 
Section 4, we discuss. 



2. FORMULATION 



2.1. Notation 



First, we show the definitions of the gamma, Bernoulli and Gaussian 
distributions, the Kullback-Leibler (KL) divergence from distribu- 
tion p{x) to q{x), and PSNR, used in this manuscript: 
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Here, F is the gamma function, | • | denotes the determinant of a 
given matrix, d is the dimension of x, and the angle brackets (•)o 
denote the expectation of • with respect to a distribution o. Also, 
diag denotes a diagonal matrix. All the vectors in this manuscript 
are column vectors. Here, these variables have absolutely nothing to 
do with the variables that appear later. 



2.2. Observation model 

We estimate an HR grayscale image x G VJ^'° from observed mul- 
tiple LR grayscale images Y = {yi}fLi,yi £ T?.^" using an SR 
technique. The images yi and x are regarded as lexicographically 
stacked vectors. The number of pixels for each LR image is assumed 
to be less than that of the HR image; i.e., Ny < Nai- Although we 
define the range of a pixel luminance value as infinite, we use —1 
for black, +1 for white, and values between —1 and +1 for shades 
of gray. The HR image x is geometrically warped, blurred, down- 
sampled, and corrupted by noise e; to form the observed LR image 



yi . It is modeled as 



Vi = W{(j>i)x + eu p(Y\x,l3,^) = Y[U{yi-W{ct,i)x,r^I), (1) 

* = {</),}/li, <Pi = [0i,fc]Li = [Ou [3i]., [3i]y,-iiV. (2) 

The ei G VJ^'" is additive white Gaussian noise (AWGN) with preci- 
sion (inverse variance) /3 (> 0). W{4>i) is the transformation matrix 
that simultaneously applies warping, blurring, and downsampling. 
The details of this matrix are given in 1 10|. 4>i is a four-dimensional 
vector consisting of the registration parameters: rotational motion 
parameter 9i, translational motion parameter o;, and blurring param- 
eter 7;. 



2.3. Prior Distributions 

We use a compound Gaussian MRF prior for the HR image and the 
latent variables r] representing the edges, called a line process, that 
is known to be favorable for natural images. It is a compounded 
distribution of the Gaussian MRF model and the line process pro- 
posed by [6J, which is widely used |7|-|9| and can simultaneously 
represent smoothness and discontinuity of the image. It is defined as 
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M'n,p,i^)i,3 = { -pvij, « ~ j, (4) 

0, otherwise. 



one only x has a Markov property. However, in Eq. (|3), ignoring 
In I A| as in f4l, fSl makes them take the same form and breaks either 
property. Therefore, in Section 3, we propose a new approximation 
that does not ignore In |A|. 

The hyperparameter priors and the registration parameter priors 
are defined as 

p{\,p,K,,l3) = Gamma(A; a\^ , h^ ) Gamma(p; a^ , bp ) 

X Gamma(«:; aJt , feJt') Gamma(/3; a^ , 6)3), (5) 



p{^)^l[^{4>v,f^Z,-El), 



(6) 



where a. 



3 X 10-^ &f ,a«, &?',««, feP, 4 
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fi^l = [0, 0, 0, 12/ a^] and S^', = diag[10-^ 10°, 10°, 10"^]. For 
a gamma distribution, the number of effective prior observations in 
the Bayesian framework is equal to two times parameter a. The 
above settings are considered sufficiently non-informative. Here, we 
assume A > by setting its prior according to a gamma distribution 
similar to [10], resulting in an appropriate inference. We define the 

mean value for the gamma distribution as fi]l = -^,Pp = -^, 



Pk 



,® 



^, and fiJ = -^. t denotes an iterative step later introduced 

through variational Bayes. The settings of the registration parameter 
priors are considered suitable for this SR task 0101 . Note that the 
mean value p!\\ of the 7; prior is derived as the value equivalent to 
the anti-aliasing of the scale factor 1 10|. 

2.4. Objective function and optimal estimator 

First, we confirm that the joint distribution of all random variables 
can now be explicitly given as 

p{Y, z) = p{Y\x, (3, *)p(x, ?7|A, p, k)p{\, p, k, /?>(*), (7) 
z = [x,r],[X,p,K,,p],^]. (8) 

We define the objective function using PSNR(a;(l^); a;) and opti- 
mal estimator as 



The summation Ei^? '^ taken over all pairs of adjacent pix- 
els. The notation i ^ j means that the i-th and the j-th pix- 
els are adjacent in upward, downward, leftward, or rightward 
directions. The line process r) consists of binary latent vari- 
ables Tji^j G {0, 1} for all adjacent pixel pairs i and j. Its size 
equals A^'t, = iN^ — [number of HR image's horizontal pixels] — 
[number of HR image's vertical pixels]. The hyperparameter A 
(> 0) is an edge-penalty parameter which prevents rii.j from exces- 
sively taking edges. Also, p (> 0) is a smoothness parameter which 
prevents differences in adjacent pixel luminance from becoming 
large, and k, (> 0) is a contrast parameter which prevents x from 
taking an improperly large absolute value. 

Here, the "causal" Gaussian MRF prior used in 0, ||5], II 101 
is defined as the joint distribution of x and ri in the form of 
p('ri)p(x\ri) , and it differs from the compound one in that it is 
not simultaneously normalized about both x and rj like Eq. (O. A 
"causal" one is an approximation of the compound one, and it is 
easier to use than the compound one because simultaneous normal- 
ization of a compound one has an exponential-order calculation cost 
with respect to the dimensionality of the line process; the calculation 
cost of a "causal" one is polynomial. Additionally, though both x 
and Tj of a compound one has a Markov property, in the "causal" 
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Since only LR images, Y , are available for the estimator, we some- 
times explicitly express it as a function form, x{Y). We choose this 
objective function because we prefer good estimator performance on 
average over various HR images and the corresponding LR images. 
From Eq. ([9}, the PM (a;)p(^|y) is the best estimator of the HR 
image. Note that the posterior distribution of an HR image p{x\Y) 
needs marginalization of all parameters otherthana;overp(2|'K). 

2.5. Approximation methods 

As stated above, we could derive the optimal estimator. However, 
we cannot obtain the marginalized posterior distribution p(x\Y) an- 
alytically. Therefore, we solve this through approximation by using 
variational Bayes 1 15|. We impose the factorization assumption on 
the trial distribution q{z) = q{x)q{ri)q{X,p,ii,l3)q{^). The optimal 
trial distribution is identified by minimizing the KL divergence be- 
tween the trial and the true distributions as the best approximation 



of the true distribution: q{z) = si,TginmDK'L{q{z)\\p{z\Y)). In the 
common style of variational Bayes 1111 . 1161 , update equations are 



JO), 



q'">{z,)=p{z,), g^^-^^^(zO«exiXMz|r)>n^.^^,(t,(,^). (10) 

Frequently, the application of variational Bayes is difficult in practice 
because it requires a conjugate prior We make the priors conjugate 
by using simple Taylor approximations similar to 1101 . We also solve 
the exponential-order calculation cost problem of a compound Gaus- 
sian MRF prior by using the same approach here. These approxima- 
tions enable the analytical calculations in Eq. l llOt . Specifically, 
we apply the first-order Taylor approximations for three non -linear 
terms. First, in Eq. ([T), W{(pi) is approximated around (pi — fiJ 
the same as in flOl-flZI. Second, the logarithm of the normaliza- 
tion term of Eq. (|3) is approximated around In A — In /i^ . Finally, 
in Eq. l[3), In |^('7j P, i^) \ is approximated around [rj, In p, In k] = 
[/i|^^' , In p$ , In ^f ] similar to 1 10 1. Note that in 1 10 1, the third ap- 
proximation is the key idea to solve the conjugate prior problem. In 
this work, we found it also enables us to solve the exponential-order 
calculation cost problem of a compound Gaussian MRF prior. This 
makes it possible to calculate the normalization term. 

2.6. Algorithm 

From Eq. JlOt and the Taylor approximations, the trial distributions 
are obtained as the following distributions: 

q^\m) = Y[^ernov\\\{T^,,j-p.l), g®(x) ^M^^; mI,S|), (U) 

q {\, p,K,, P) = Gainina(A; a\^ , b^ ) Gamma(p; af , bf ) 

X Gamma(K;a®,6®)Gamma(/3;a^\6|), (12) 



g<*'(*) = nA^(<^';MtS^ 



(13) 



The specific form of the trial distributions is omitted because of 
space limitations. For Eq. JlOt , we do the following. First, we 
compute q'^'^^' {t]) using g® (a;, A, p, k,/?, $). Second, we com- 
pute q^*^^''{x) using g'^*^"'"' (T7)g® (A, p, k, /?, $). Last, we com- 
pute g(*+i) (A, p, K, /3) using q^+^\x,r])qf'>{^) and g(*+^)(*) us- 
ing g''"*'^'(a;,T7)g®(A,p, «:, /3). For the initial parameters of the 
trial distributions of rj and x, we use non-informative values, 
/i,y = 0, fij = 0, Sy = 0. As the initial parameters for A, 
p, /3, K, and $ we use the same values as their prior's values in Eq. 
^, ([6}. We obtain the well approximated PM of x as /i^. , for 
which the following convergence conditions hold for /i 



(t+l) 



and each 
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(t+l) 
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^t 



(a'-'-^^' - u® )' 



< 10" 



(fc = 1,2,3,4), 



(14) 



(=1 ^"<^J'= 

where we defined crj = [10~^, 10°, lO", 10~^] as the scaling con- 
stant. 

3. EXPERIMENTAL RESULTS 

We evaluated the proposed method using five gray-scale images with 
a size of 40 x 40 pixels as shown in Fig. [T] From each image, L = 




Fig. 1. Five original images used in the experiments. From left to 
right: Lena, Cameraman, Pepper, Clock, Text. 





Fig. 2. Observed images when warped, blurred, downsampled by an 
enhancement factor of 4, and noised with SNR= 30 dB AWGN 



10 images with a size of 10 x 10 pixels were created by using Eq. 
([TJ. The resolution enhancement factor was 4. The transformation 
parameter $ was generated from the prior distribution in Eq. (|6], 
where it is similar to that in previous work 121. 141. [51. 11101 - 1131 . The 
noise level parameter /3 was set for a signal-to-noise ratio (SNR) of 
20, 25, and 30 dB for each image. Samples of the created images 
are shown in Fig. |2] Figure |3] shows the estimated images under 
SNR= 30dB. We can see that the resolution of each image appeared 
to be better than that of the corresponding observed image. 

Table[T]shows the quantitative results compared to methods (a), 
(b), (c), and (d), where (a) is bilinear interpolation, (b) is the vari- 
ational EM approach with a causal Gaussian MRF prior using the 
maximum a posteriori (MAP) estimator |4|, (c) is the variational 
Bayes approach with a TV prior using the MAP estimator flll|, 
and (d) is the PM approach with a causal Gaussian MRF prior us- 
ing the PM estimator 1 10 1. Note that we added a slight modifica- 
tion to methods (b) and (c), the same as in |10|, because they em- 
ploy slightly different models. Over 100 experiments on each image 
and for each SNR, we evaluated the results with regard to the ex- 
pectation and the standard deviation of the PSNR of the proposed 
method and the improvement in signal-to-noise ratio (ISNR) defined 
as, ISNR = PSNR(i; x) - PSNR(i; a;), where x is the image 
estimated by the proposed method, and x is the image estimated by 
the method to compare (i.e., (a), (b), (c), and (d) in these experi- 
ments). We see that the ISNRs of the images estimated by the pro- 
posed method were mostly better than those by the other methods. 
In the subjective visual comparison in Fig. 4, we also see that the 
edges are not overemphasized in the images estimated by the pro- 
posed method compared to those in the images estimated by other 
methods. 

The calculation times with the proposed method and with meth- 
ods (b), (c) and (d) for each estimate, using an Intel Core 17 2600 
processor, were almost the same, about 20 minutes. 

4. DISCUSSION 

In this manuscript, we have shown how we can adopt a compound 
Gaussian MRF prior for PM SR. We got good results for almost all 
images and noise conditions. In the comparison with method (c) and 
(d), the superiority of ISNR was fair and the inferiority of ISNR was 
small. Regarding these some unfavorable results in Table[T] we think 
that the reason is our numerical optimization method falls slightly 
short of optimization because the proposed method uses more ap- 
proximations than method (d) and (c). In addition, we consider the 
result compared to method (c) in the case of the Pepper image in 40 
dB noise is caused by unstable estimation of 7 and p, where method 
(c) fixed the value of 7 to the true expected value in our implemen- 



Table 1. PSNR of the proposed method and ISNRs against four previous methods for different images 



and SNR levels 



Image 


SNR [dB] 


PSNR (proposed) 


ISNR (a) 


ISNR (b) 


ISNR (c) 


ISNR (d) 


Lena 


20 


29.31 ±0.30 


5.45 ± 0.33 


0.67 ±0.34 


0.02 ±0.11 


0.05 ±0.01 




30 


32.15 ±0.36 


8.20 ± 0.37 


1.74 ± 0.34 


0.52 ±0.18 


0.10 ±0.20 




40 


34.19 ±0.60 


10.24 ±0.60 


3.21 ±0.53 


0.95 ±0.60 


1.49 ±0.77 


Cameraman 


20 


21.76 ±0.20 


4.13 ±0.21 


0.95 ±0.32 


-0.04 ±0.08 


-0.01 ±0.01 




30 


23.59 ±0.28 


5.92 ± 0.28 


1.56 ±0.32 


-0.01 ±0.11 


-0.06 ± 0.02 




40 


25.04 ± 0.41 


7.37 ± 0.42 


2.70 ± 0.30 


0.32 ±0.27 


-0.01 ±0.14 


Pepper 


20 


29.73 ±0.24 


3.68 ± 0.26 


0.09 ± 0.41 


0.23 ±0.10 


0.11 ±0.87 




30 


31.65 ±0.33 


5.51 ±0.33 


0.76 ± 0.48 


0.11 ±0.22 


0.35 ±0.33 




40 


32.23 ±0.51 


6.09 ±0.51 


1.11 ±0.45 


-0.17 ±0.48 


0.93 ±0.56 


Clock 


20 


23.29 ±0.28 


5.38 ± 0.29 


1.40 ±0.23 


0.10 ±0.09 


0.01 ±0.01 




30 


25.42 ±0.29 


7.46 ± 0.29 


2.59 ± 0.30 


0.29 ±0.13 


-0.03 ±0.01 




40 


27.08 ± 0.38 


9.13 ±0.38 


4.00 ±0.31 


0.74 ±0.32 


-0.07 ±0.12 


Text 


20 


24.68 ± 0.32 


5.83 ± 0.33 


1.65 ±0.29 


-0.06 ± 0.09 


0.02 ±0.02 




30 


27.27 ±0.43 


8.38 ± 0.44 


3.09 ± 0.41 


0.19 ±0.18 


-0.03 ± 0.04 




40 


29.28 ±0.62 


10.39 ±0.62 


4.85 ±0.51 


0.78 ±0.51 


1.98 ±0.69 



r^E^ 



k\^TZ 



Fig. 3. Images estimated from the Fig. |2]observed images 




Fig. 4. Comparison of estimated images with the proposed method 
and methods (a), (b), (c), and (d) under SNR= 30dB 

tation same to 1 10|. Therefore, we think compound Gaussian MRF 
prior is considered preferable to a "causal" Gaussian MRF prior as a 
natural image prior. 

Regarding the estimator, we used the optimal estimator, the PM. 
From the experimental results, we see that the SR methods with the 
PM estimator (i.e., the proposed method and method (d)) were more 
accurate than the SR methods with other estimators (i.e., method (b) 
and method (c)). This indicates that PM is an optimal estimator. 

Regarding the calculation cost, our algorithm requires 0{N'^), 
and we must make our algorithm faster to apply this method to larger 
images. By using an idea similar to that of 1 10], 1 1 1 1, we have devel- 
oped a faster algorithm, but this algorithm causes obvious degrada- 
tion of accuracy. We continue to search for a more effective way to 
reduce the calculation cost with less degradation of accuracy. 

We can say that the proposed method is an SR method with a 
favorable model and an optimal estimator. In addition, the proposed 
method does not need any parameter tuning, which is a favorable 
property in practice. We think our approach to the problem about 
the conjugate prior and the exponential-order calculation cost can be 
applied to many other problems, and we will try to do this in our 
future work. 
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